A Traceable Data Fusion Based on Data Provenance
Abstract
Data fusion is a hot topic in data integration and includes at least two stages: entity resolution and data conflict resolution. However, existing fusion processes are hidden from the user, and the fusion stages are isolated from one another. In this paper, we propose a traceable data fusion mechanism based on data provenance that can trace both the data sources of fusion results and their evolutionary process. The mechanism mainly targets the entity resolution and data conflict resolution stages. We represent the provenance of data origin using PI-CS, which is more accurate because it can record the intermediate process of data evolution. To record the evolution process of data fusion, we propose two transformation provenances, entity resolution provenance and data conflict resolution provenance, which record the evolution process of entity resolution and data conflict resolution, respectively. Finally, we give an example to validate the feasibility of the traceable mechanism for data fusion.
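To make the two recorded stages concrete, the following is a minimal sketch and not the paper's mechanism: it does not implement PI-CS, and every name in it (`Record`, `resolve_entities`, `resolve_conflicts`, the `majority_vote` strategy, the sample sources `s1` to `s3`) is a hypothetical chosen for illustration. It assumes entities are matched on a normalized key attribute and conflicts are settled by majority vote, and it shows how each stage could keep a provenance record that lets a fused value be traced back to its source records.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str   # hypothetical source identifier, e.g. "s1"
    rid: str      # record id within that source
    attrs: tuple  # ((attribute, value), ...) pairs


def resolve_entities(records, key_attr):
    """Group records whose normalized key attribute matches.

    Returns (clusters, er_prov). er_prov maps each entity id to the
    (source, rid) pairs that were merged into it.
    """
    clusters = defaultdict(list)
    for rec in records:
        key = dict(rec.attrs)[key_attr].strip().lower()
        clusters[key].append(rec)
    er_prov = {eid: [(r.source, r.rid) for r in recs]
               for eid, recs in clusters.items()}
    return dict(clusters), er_prov


def resolve_conflicts(clusters):
    """Pick one value per attribute by majority vote (an assumed strategy).

    Returns (fused, cr_prov). cr_prov records, per (entity, attribute),
    which source records supported the chosen value and which were rejected.
    """
    fused, cr_prov = {}, {}
    for eid, recs in clusters.items():
        by_attr = defaultdict(list)
        for r in recs:
            for attr, val in r.attrs:
                by_attr[attr].append((val, (r.source, r.rid)))
        fused[eid] = {}
        for attr, pairs in by_attr.items():
            winner = Counter(v for v, _ in pairs).most_common(1)[0][0]
            fused[eid][attr] = winner
            cr_prov[(eid, attr)] = {
                "strategy": "majority_vote",
                "supporting": [o for v, o in pairs if v == winner],
                "rejected": [(v, o) for v, o in pairs if v != winner],
            }
    return fused, cr_prov


# Illustrative data: three sources describing the same person.
records = [
    Record("s1", "1", (("name", "Alice"), ("city", "Paris"))),
    Record("s2", "7", (("name", "alice "), ("city", "Lyon"))),
    Record("s3", "4", (("name", "Alice"), ("city", "Paris"))),
]
clusters, er_prov = resolve_entities(records, "name")
fused, cr_prov = resolve_conflicts(clusters)
```

Looking up `cr_prov[("alice", "city")]` then answers which source records back the fused city value, and `er_prov["alice"]` lists every record merged into that entity, so the two stages are linked rather than isolated.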
Similar resources
Incremental Data Fusion Based on Provenance Information
Data fusion is the process of combining multiple representations of the same object, extracted from several external sources, into a single and clean representation. It is usually the last step of an integration process, which is executed after the schema matching and the entity identification steps. More specifically, data fusion aims at solving attribute value conflicts based on user-defined ...
Data provenance tracking as the basis for a biomedical virtual research environment
In complex data analyses it is increasingly important to capture information about the usage of data sets in addition to their preservation over time to ensure reproducibility of results, to verify the work of others and to ensure appropriate conditions data have been used for specific analyses. Scientific workflow based studies are beginning to realize the benefit of capturing this provenance ...
A New Method for Multisensor Data Fusion Based on Wavelet Transform in a Chemical Plant
This paper presents a new multi-sensor data fusion method based on the combination of wavelet transform (WT) and extended Kalman filter (EKF). Input data are first filtered by a wavelet transform via Daubechies wavelet “db4” functions and the filtered data are then fused based on variance weights in terms of minimum mean square error. The fused data are finally treated by extended Kalman filter...
Using Provenance to support Good Laboratory Practice in Grid Environments
Conducting experiments and documenting results is daily business of scientists. Good and traceable documentation enables other scientists to confirm procedures and results for increased credibility. Documentation and scientific conduct are regulated and termed as “good laboratory practice.” Laboratory notebooks are used to record each step in conducting an experiment and processing data. Origin...
Query Processing with Materialized Views in a Traceable P2P Record Exchange Framework
Materialized views which are derived from base relations and stored in the database are often used to speed up query processing. In this paper, we leverage them in a traceable peer-to-peer (P2P) record exchange framework which was proposed to ensure reliability among the exchanged data in P2P networks where duplicates and modifications of data occur independently in autonomous peers. In our pro...